We discuss finite volume errors in our calculations of $B_K$ using improved staggered fermions on 
the MILC asqtad lattices. Using GPUs, we are now able to extrapolate using next-to-leading order 
[...] from pion loops. We find that the impact of FV fitting is very small, giving a 0.5% shift in the 
continuum limit. 

1. Introduction 

The dominant error in our calculation of $B_K$ using improved staggered quarks [JTJ] comes from 
our use of a truncated matching factor, but a significant subdominant error is that from extrapolating 
from finite to infinite volume. In our earlier work we estimated this error by comparing the results 
on two volumes. Here we describe an alternative estimate using next-to-leading order (NLO) chiral 
perturbation theory (ChPT). Specifically, we replace the pion loop integrals with their finite volume 
form, perform the chiral fit, and then use the fit parameters to determine the result in infinite volume. 
This method is fairly standard in chiral fits, but, given our large data set, has been too expensive 
to implement at the desired accuracy until recently. Using GPUs we can now incorporate the finite 
volume (FV) corrections into the fitting routines using SU(2) staggered ChPT. First results were 
presented in Ref. [gj, and here we present an update. 

2. Finite Volume Effects in SU(2) staggered chiral perturbation theory 

Finite volume corrections enter at NLO in SU(2) ChPT only through the chiral logarithms 
arising from loops of pions composed of valence $d$ and $\bar{d}$ quarks. The standard chiral logarithmic 
functions that enter are 

$$\ell(X) = X \left[ \log(X/\mu_{\rm DR}^2) + \delta_1^{\rm FV}(X) \right], \qquad (2.1)$$
$$\tilde{\ell}(X) = -\frac{d\ell(X)}{dX} = -\log(X/\mu_{\rm DR}^2) - 1 + \delta_3^{\rm FV}(X), \qquad (2.2)$$

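where $\mu_{\rm DR}$ is the scale introduced by dimensional regularization, and $X$ is the squared mass (in physical 
units) of the $d\bar{d}$ pion. The functions $\delta_1^{\rm FV}(X)$ and $\delta_3^{\rm FV}(X)$ contain the finite volume corrections: 

$$\delta_1^{\rm FV}(M^2) = \frac{4}{ML} \sum_{n \neq 0} \frac{K_1(|n| M L)}{|n|}, \qquad (2.3)$$
$$\delta_3^{\rm FV}(M^2) = 2 \sum_{n \neq 0} K_0(|n| M L), \qquad (2.4)$$

where $M$ is the pion mass, $L$ is the box size in the spatial directions, $K_1$ and $K_0$ are modified Bessel 
functions, and $n = (n_1, n_2, n_3, n_4)$ is an image vector on the four-dimensional lattice. The norm $|n|$ is 

$$|n| = \sqrt{n_1^2 + n_2^2 + n_3^2 + \left(\frac{L_T}{L}\right)^2 n_4^2}, \qquad (2.5)$$

where $L_T$ is the Euclidean temporal box size. 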
The details are explained in Ref. 
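
As an illustration of Eqs. (2.3)-(2.5), the sketch below evaluates the two image sums by brute force in plain Python. It is not the production code described in this paper: the helper names (`bessel_k`, `fv_corrections`), the fixed integer cutoffs `r_s` and `r_t`, and the trapezoid-rule evaluation of the Bessel functions from the integral representation $K_\nu(x) = \int_0^\infty e^{-x\cosh t}\cosh(\nu t)\,dt$ are all assumptions made for this example.

```python
import math

def bessel_k(nu, x, h=0.05):
    """Modified Bessel function K_nu(x), x > 0, from the integral
    K_nu(x) = int_0^inf exp(-x cosh t) cosh(nu t) dt (trapezoid rule)."""
    total = 0.5  # t = 0 endpoint carries weight 1/2; exp(-x) is factored out
    k = 1
    while True:
        t = k * h
        decay = x * (math.cosh(t) - 1.0)
        if decay > 50.0:  # remaining terms are negligible in double precision
            break
        total += math.exp(-decay) * math.cosh(nu * t)
        k += 1
    return h * total * math.exp(-x)

def fv_corrections(M, L, LT, r_s, r_t):
    """Return (delta_1^FV, delta_3^FV) of Eqs. (2.3)-(2.4), keeping images
    with |n_i| <= r_s (i = 1, 2, 3) and |n_4| <= r_t (hypothetical cutoffs)."""
    ratio2 = (LT / L) ** 2
    multiplicity = {}  # |n|^2 -> number of image vectors with that norm
    for n1 in range(-r_s, r_s + 1):
        for n2 in range(-r_s, r_s + 1):
            for n3 in range(-r_s, r_s + 1):
                for n4 in range(-r_t, r_t + 1):
                    if n1 == n2 == n3 == n4 == 0:
                        continue  # n = 0 is the infinite-volume term
                    norm2 = n1 * n1 + n2 * n2 + n3 * n3 + ratio2 * n4 * n4
                    multiplicity[norm2] = multiplicity.get(norm2, 0) + 1
    sum1 = sum3 = 0.0
    for norm2, count in multiplicity.items():
        norm = math.sqrt(norm2)  # Eq. (2.5)
        sum1 += count * bessel_k(1, norm * M * L) / norm
        sum3 += count * bessel_k(0, norm * M * L)
    return 4.0 / (M * L) * sum1, 2.0 * sum3

# Example: M L = 4 with L_T = 3.2 L (the aspect ratio of a 20^3 x 64 lattice).
d1, d3 = fv_corrections(M=4.0, L=1.0, LT=3.2, r_s=4, r_t=1)
print(d1, d3)
```

Grouping image vectors by $|n|^2$ keeps the number of Bessel evaluations small. Because both sums run over the same set of images, the truncated results still satisfy $\delta_3^{\rm FV} = -d(X\delta_1^{\rm FV})/dX$ term by term, as required by Eqs. (2.1)-(2.2).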

3. Numerical Study 

In order to calculate the finite volume corrections $\delta_1^{\rm FV}$ and $\delta_3^{\rm FV}$ in Eqs. (2.3-2.4), we use the 
following criteria to truncate the sum over $n$. For $\delta_1^{\rm FV}$, with desired precision $\epsilon = 1.0 \times 10^{-14}$ 
(double precision), we first determine $r_{\rm max}$ from 

$$[4\pi r_{\rm max}^2] \times K_1(r_{\rm max} M L) = \epsilon \times [6 K_1(ML)]. \qquad (3.1)$$

Here, $4\pi r_{\rm max}^2$ is the density of image vectors at $|n| = r_{\rm max}$ and $6K_1(ML)$ is the contribution to $\delta_1^{\rm FV}$ 
from the first set of images with $|n| = 1$. In words, we keep images out to a distance $r_{\rm max}$ at 
which the contribution from a shell of thickness $\Delta r = 1$ equals the desired precision times the leading 
contribution from $|n| = 1$. We assume in this estimate that $L_T \gg L$ (with $L_T$ the extent in the 
temporal direction), so that we only consider spatial images. This is the source of the factor of 6 
multiplying $K_1(ML)$. Similarly for $\delta_3^{\rm FV}$, we define $r_{\rm max}$ from 

$$[4\pi r_{\rm max}^2] \times K_0(r_{\rm max} M L) = \epsilon \times [6 K_0(ML)]. \qquad (3.2)$$

In the second step, we define spatial and temporal "radii" through 

$$r_s = r_{\rm max}, \qquad r_t = \frac{L}{L_T} \times r_{\rm max}. \qquad (3.3)$$

Finally, when we calculate the finite volume corrections Eq. (2.3) and Eq. (2.4), we include only 
images satisfying 

$$-r_s \le n_i \le r_s \quad \text{for } i = 1, 2, 3,$$
$$-r_t \le n_4 \le r_t. \qquad (3.4)$$

Therefore, the number of image vectors $n$ is essentially $(2r_s + 1)^3 \times (2r_t + 1)$. 
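
The truncation procedure of Eqs. (3.2)-(3.4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the bisection solver, and the choice to round the radii up to integers with `ceil` are choices made for this example and are not specified in the text. The bracketing also assumes $ML \gtrsim 2$, so that the left-hand side of Eq. (3.2) decreases monotonically for $r \ge 1$.

```python
import math

def bessel_k0(x, h=0.05):
    """K_0(x), x > 0, from K_0(x) = int_0^inf exp(-x cosh t) dt (trapezoid rule)."""
    total = 0.5  # t = 0 endpoint with weight 1/2; exp(-x) is factored out
    k = 1
    while True:
        decay = x * (math.cosh(k * h) - 1.0)
        if decay > 50.0:  # remaining terms are negligible in double precision
            break
        total += math.exp(-decay)
        k += 1
    return h * total * math.exp(-x)

def find_r_max(ML, eps=1.0e-14):
    """Solve 4 pi r^2 K_0(r ML) = eps * 6 K_0(ML) for r by bisection, Eq. (3.2)."""
    target = eps * 6.0 * bessel_k0(ML)

    def excess(r):
        return 4.0 * math.pi * r * r * bessel_k0(r * ML) - target

    lo, hi = 1.0, 2.0
    while excess(hi) > 0.0:  # expand the bracket until the shell term is too small
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def image_radii(ML, L, LT, eps=1.0e-14):
    """Return r_max, the integer radii of Eq. (3.3), and the image count of Eq. (3.4)."""
    r_max = find_r_max(ML, eps)
    r_s = math.ceil(r_max)            # assumption: round up to an integer
    r_t = math.ceil(r_max * L / LT)   # assumption: round up to an integer
    n_images = (2 * r_s + 1) ** 3 * (2 * r_t + 1)
    return r_max, r_s, r_t, n_images

# Example: M L = 3 on a 20^3 x 64 lattice.
r_max, r_s, r_t, n_images = image_radii(ML=3.0, L=20, LT=64)
print(r_max, r_s, r_t, n_images)
```

Since the Bessel functions fall off roughly as $e^{-r ML}$, the radius $r_{\rm max}$ grows as $ML$ decreases, so the number of image vectors rises quickly at light pion masses.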

To draw plots of $B_K$ vs. pion mass-squared $X$, we calculate finite volume corrections for about 
one hundred different mass values. The radius $r_{\rm max}$ varies with $X$, but roughly we find we need, for 
100 different mass values in the relevant range, about $10^9$ image vectors. Since there are about 
1000 configurations in each ensemble, we need about $10^{12}$ evaluations of Bessel functions per 
ensemble. If we use a standard CPU to calculate finite volume corrections for all the MILC asqtad 
ensembles that we have data on, it takes about two months. This is clearly impractical, and we need 
significantly faster computational resources. GPUs provide the solution to this problem. 



4. CUDA Programming 

GPUs are composed of many small multiprocessors that handle single-instruction, multiple-data 
(SIMD) workloads efficiently. We use Nvidia GTX480 GPUs which have a peak speed of 168 giga flops 
in double precision [gp. We use CUDA for GPU programming and obtain 64.3 giga flops (38% of 
the peak) in double precision. This is almost 120 times faster than the CPU code (0.5 giga flops). 
We use the following optimization techniques. 

• Substituting Division by Multiplication: 

Division is slower than multiplication on the GPU. For example, the division operation 
x/4 is much slower than the multiplication operation x × 0.25. After this optimization, 
we get a 16% gain in speed. 

• Coalesced Access: 

Coalesced access allows sequential threads to access sequential GPU memory addresses in parallel. 
Coalesced access is at least twice as fast as uncoalesced access. We find that using 
coalesced access in the global sum algorithm leads to a 20% gain. 






Finite Volume Errors in Bk 



Jangho Kim 



5. Results 

We use MILC asqtad ensembles listed in Table 1. They are generated with $N_f = 2+1$ flavors 
of asqtad staggered sea quarks. The values of light sea quark masses ($am_l$) and strange sea quark 
masses ($am_s$) are given in Table 1. We use four different lattice spacings: coarse ($a = 0.12$ fm), 
fine ($a = 0.09$ fm), superfine ($a = 0.06$ fm), and ultrafine ($a = 0.045$ fm) lattices. 

Table 1: MILC lattices used for the numerical study. Here, "ens" represents the number of gauge 
configurations, "meas" is the number of measurements per configuration, and ID will be used later 
to identify the corresponding lattice. 

a (fm)   am_l/am_s       size           ens x meas   ID
0.12     0.03/0.05       20^3 x 64      564 x 9      C1
0.12     0.02/0.05       20^3 x 64      486 x 9      C2
0.12     0.01/0.05       20^3 x 64      671 x 9      C3
0.12     0.01/0.05       28^3 x 64      275 x 8      C3-2
0.12     0.007/0.05      20^3 x 64      651 x 10     C4
0.12     0.005/0.05      24^3 x 64      509 x 9      C5
0.09     0.0062/0.031    28^3 x 96      995 x 9      F1
0.09     0.0031/0.031    40^3 x 96      850 x 1      F2
0.06     0.0036/0.018    48^3 x 144     744 x 2      S1
0.06     0.0025/0.018    56^3 x 144     198 x 9      S2
0.045    0.0028/0.014    64^3 x 192     705 x 1      U1

In our numerical study on $B_K$, we use HYP-smeared staggered fermions as valence quarks. 
HYP staggered fermions have a number of advantages, such as reducing taste symmetry breaking 
as efficiently as the HISQ action. We use 10 different values of the valence quark masses ($m_x$ for 
the $d$ quark and $m_y$ for the $s$) as given in Table 2. 



Table 2: Valence quark masses (in lattice units). 

a (fm)   am_x and am_y
0.12     0.005 x n    with n = 1, 2, 3, ..., 10
0.09     0.003 x n    with n = 1, 2, 3, ..., 10
0.06     0.0018 x n   with n = 1, 2, 3, ..., 10
0.045    0.0014 x n   with n = 1, 2, 3, ..., 10



In Table 3, we present our results for $B_K$ with and without including finite volume (FV) terms 
in the fitting, as well as the difference between the two. Note that the differences are statistically 
significant despite the fact that the error in the individual results is larger than the difference. This 
is because the two fits are highly correlated. We find very small shifts, indicating that FV effects are 
a subpercent systematic. We also note that the impact of including FV corrections on our largest 
lattice (C3-2) is negligible, indicating that this volume is effectively infinite. 

Table 3: $B_K(\mathrm{NDR}, 1/a)$ with finite volume corrections. The results are obtained by extrapolation to the physical 
down quark mass and removing lattice artifacts due to taste breaking. The second column gives the results 
from extrapolation using the infinite volume SU(2) staggered ChPT form. The third column gives results 
from fitting to the FV form, and then removing the FV corrections from the final number. The last column 
gives the percentage change. The fit type is 4X3Y-NNLO of the SU(2) analysis, which is explained in 
Ref. [|l[j. $am_y$ is fixed to the heaviest quark mass (for example, $am_y = 0.05$ for the C3 ensemble). 

ID     B_K          B_K (FV)     ΔB_K
C3     0.5734(46)   0.5743(46)   +0.16%
C3-2   0.5784(46)   0.5785(46)   +0.02%
F1     0.5074(37)   0.5049(37)   -0.49%
S1     0.4914(65)   0.4898(65)   -0.33%
U1     0.4812(65)   0.4790(65)   -0.46%
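
The last column of Table 3 follows directly from the two fit results. As a quick arithmetic check (the dictionary name and layout are just for this example), the relative shift $(B_K^{\rm FV} - B_K)/B_K$ can be recomputed from the central values:

```python
# Central values from Table 3: ID -> (B_K without FV, B_K with FV).
table3 = {
    "C3":   (0.5734, 0.5743),
    "C3-2": (0.5784, 0.5785),
    "F1":   (0.5074, 0.5049),
    "S1":   (0.4914, 0.4898),
    "U1":   (0.4812, 0.4790),
}

def percent_shift(bk, bk_fv):
    """Relative change in percent when FV corrections are included in the fit."""
    return 100.0 * (bk_fv - bk) / bk

shifts = {ens: percent_shift(*vals) for ens, vals in table3.items()}
for ens, shift in shifts.items():
    print(f"{ens:5s} {shift:+.2f}%")
```

The shifts computed this way use only the quoted central values, so they do not reproduce the correlated error analysis that makes the differences statistically significant.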



We now display some of the fits that lead to these numbers. Figures 1(a), 1(b), 2(a), 2(b) 
and 3(a) show "X-fits" on the C3, C3-2, F1, S1 and U1 ensembles, respectively. The red line 
denotes the fit without finite volume corrections and the blue line denotes the fit with FV corrections 
included. The diamonds give $B_K$ obtained, as explained above, by extrapolating $m_x \to m_d^{\rm phys}$, setting 
all pion taste-splittings to zero, and (in the case of the FV fit) setting $L, L_T \to \infty$. 



[Figure 1 plots: (a) C3 ensemble, (b) C3-2 ensemble; horizontal axis $X_P$ (GeV$^2$).] 

Figure 1: $B_K(1/a)$ vs. $X$. The left figure shows results from the C3 ensemble, while the right figure shows 
results from the C3-2 ensemble. The fit type is 4X3Y-NNLO in the SU(2) analysis [Q]. We fix $am_y = 0.05$. 
The red line represents the results of fitting with no finite volume correction. The blue line corresponds 
to those with finite volume corrections included. The diamonds correspond to the $B_K$ value obtained by 
extrapolating $m_x$ to the physical light valence quark mass after setting all the pion taste-splittings to zero. 



Fig. 3(b) compares the continuum extrapolation with and without the finite volume corrections. 
The total correction in the continuum limit is 0.46%. 

[Figure 2 plots: (a) F1 ensemble, (b) S1 ensemble; horizontal axis $X_P$ (GeV$^2$).] 

Figure 2: $B_K(1/a)$ vs. $X$. The left figure shows results from the F1 ensemble and the right figure from the 
S1 ensemble. The fit type is 4X-NNLO in the SU(2) analysis. 




[Figure 3 plots: (a) U1 ensemble, (b) Scaling.] 

Figure 3: The left figure shows $B_K(1/a)$ vs. $X$ for the U1 ensemble. The fit type is 4X-NNLO in the SU(2) 
analysis. The right figure shows $B_K(2\,\mathrm{GeV})$ vs. $a^2$. The red octagons show data obtained using the SU(2) 
fitting without the finite volume corrections. The blue crosses show results from SU(2) fitting with the FV 
corrections incorporated. Diamonds show the results after extrapolation to the continuum ($a = 0$) using the 
smallest three values of $a$. 

6. Conclusion 

By using GPUs, we have significantly reduced the computational time for FV corrections in 
NLO chiral expressions. This has made it practical to fit and extrapolate using the FV-corrected 
forms. Using this method we have updated all our SU(2) staggered ChPT fits, with results reported 
in Ref. |@|. 

We find the FV effect to be at the subpercent level, although, as shown in the figures for the 
finest ensembles, FV effects would get much larger if we lowered the valence quark masses any 
further. 

Comparing the results in Table 3 from the C3 and C3-2 ensembles, we see that the FV shift 
on the C3 lattice is significantly smaller than the difference between the central values from the two 
lattices. There is no inconsistency here because the errors on individual lattices are large enough 
that we cannot statistically distinguish between the results on the two volumes. Because of this, we 
think that the FV shift based on ChPT is a more reliable estimator of the FV systematic, and we 
use this in our updated results. 